Dominance of Asia II 1 species of Bemisia tabaci in Pakistan and beyond

Globally, Whitefly (Bemisia tabaci) is one of the most important insect pests of crops that causes huge economical losses. The current study was designed to exclusively screen the B. tabaci species in the cotton field of Pakistan during 2017–2020 and have to conduct comparative analysis of B. tabaci species in Asia where Asia II 1 has been reported. A total of 5142 B. tabaci sequences of mitochondrial cytochrome oxidase 1 (mtCO1) from Asian countries were analyzed to determine the species and their distribution in the region. Our analysis over time and space showed that Asia II 1 has gradually dominated over Asia 1 in Punjab Province and over both Asia 1 and MEAM1 in Sindh Province. Asia has been divided into three regions i.e., South Asia (2524 sequences), Southeast Asia (757 sequences) and East Asia (1569 sequences) and dominance of different species of B. tabaci has been determined by calculating the relative percentage of each species. Interestingly, Asia II 1 has been found dominant in the neighboring region (northern zone) of India and also being dominant in its central zone. The dominance of Asia II 1 in Pakistan and northern India explains whitefly epidemic being reported in recent years.

www.nature.com/scientificreports/ diversity within the B. tabaci 8,9 . Based on esterase profiling biotypes were alphabetically designated from A to T. Some other molecular markers, mt16S ribosomal DNA, mitochondrial cytochrome oxidase 1 (mtCO1) and the nuclear ribosomal intergenic spacer 1 (ITS1) were also explored to find the genetic differences among biotypes. The mtCO1 gene is highly valuable and has been widely used for classifying haplotypes of B. tabaci and its close relatives. Based on threshold of 3.5% pairwise genetic divergence in mtCO1, the B. tabaci was resolved into complete or partial reproductively isolated groups. Lee et al. 10 suggested that the threshold of the species boundary in the B. tabaci complex is required to be replaced with 4% genetic divergence. Based on mtCO1 sequences a revised nomenclature was established which tells the geographical affiliation of genetic groups. The term biotype was abandoned and genetic groups were described as cryptic species of B. tabaci complex.
The two highly invasive whitefly species, MEAM1 and MED originated from Middle East regions, are now invading other countries of the world 11 . In Pakistan, MEAM1 is mainly found in the southern region (Sindh province) while Asia II 1 is the most prevalent species in central and north western regions. Other than these species, Asia 1, Asia II 5, Asia II 7, and Asia II 8 have also been reported from Pakistan 12 .
The current study was designed to explore the genetic diversity of B. tabaci on cotton crop with mtCO1 marker in Pakistan. The second objective was to check the genetic diversity of B. tabaci in other Asian countries including all three major Asia regions including South Asia, South East Asia, and East Asia. Moreover, recently, Asia II 1 is also reported from Syria in Middle East region so the last objective was to check the status of different species of B. tabaci in Middle East countries. The findings of this study have described the current status of different species of B. tabaci in Asian countries. This will be helpful for the development of future management strategies to control this devastating pest in the region.

B. tabaci infestation; a major constraint to cotton production in Pakistan.
In the last few years, cotton production in Pakistan is dramatically decreased due to various reasons. The data present in Fig. 1A shows the comparison of cotton production from year 2014 to 2021. There is a marked decrease in the production of cotton. In year 2014, the cotton production was 13.9 million bales which reduced to almost half i.e., 7 million bales in 2020-2021. The data present in Fig. 1B shows 2X infestation in year 2020 in comparison of year 2019, where the infestation percentage was 11.64%.
Identification of B. tabaci species from Pakistan. Samples of whitefly were collected from 2017 to 2020 from different provinces (Punjab and Sindh) of the country (Fig. 2). Collectively, we identified 80/82 sequences of Asia II 1 and 2/82 sequences of Asia II 7 species from Pakistan. No other species was detected in the current study. Asia II 7 was only detected from samples collected from Islamabad; the city in northern areas that is not the major cotton growing areas. The samples detail which was collected in the current study is summarized in Table 1.
Phylogenetic analysis of mtCO1 sequences of B. tabaci. Phylogenetic dendrogram was constructed using 82 nucleotide mtCO1 sequences of B. tabaci and B. tuberculata as an outgroup. Tree analysis showed the presence of two species of B. tabaci in the current study. No other species of B. tabaci was identified in the current study. The samples (MZ313303, MZ313304) collected from the Sindh cotton growing areas, were also showing the similarity with Asia II 1. The neighbor-joining phylogenetic tree was reconstructed to infer the relationship of the sequences identified in the current study is shown in Fig. 3A and B.
Distribution of B. tabaci in Pakistan at the province level. In total 1096 sequences were so far reported from Pakistan after removing the 5' mtCOI gene sequences and also those sequences that contain somehow wrong information (actually they belong to some other species of Bemisia genus). These 1096 sequences were phylogenetically analyzed to determine their species. There were eight species presently named as Asia II 1, MEAM1, Asia II 7, Asia 1, Asia II 5, Asia II 8, Pakistan and Pakistan 1. A single sequence (KJ709461) named as "Pakistan" species was previously reported from Islamabad 13 whereas a single sequence (GU585374) was named  www.nature.com/scientificreports/ "Pakistan-1" during the current analysis and according to GenBank submission data it was reported from Sindh province of Pakistan. The relative percentage of each species was determined and shown in the pie diagram against each province in Fig. 4. The Asia II 1 species showed the highest relative % in Balochistan, KPK and Punjab provinces where MEAM1 showed the highest relative % in Sindh province followed by Asia II 1 species. To further elaborate the situation, relative % of each species was determined with the timeline (an interval of 5 years). Distribution of B. tabaci from other countries in South Asia. In total 1166 sequences were so far reported from India after removing the 5′ mtCOI gene sequences and those sequences which contain incomplete information (they were belonging to some other species of Bemisia genus). These 1166 sequences were analyzed to determine their species. The data present in Fig. 6A shows the relative % age of B. tabaci species in India where Asia 1 is dominant over the Asia II 1, Asia II 5, Asia II 7, and Asia II 8. As India is very large country as compared to Pakistan so its zones were used for the comparison of different species in these zones. Overall, India is divided into 6 zones, out of these, B. tabaci is reported from five zones shown in Fig. 6B with the relative % age of each species reported. Like Pakistan, Asia II 1 is dominant in its neighboring zone of India i.e., Northern zone. Interestingly it is also dominant in Central zone of India.
The other countries of South Asia where B. tabaci has been reported are Bangladesh, Nepal and Afghanistan. The relative % of each species from Bangladesh and Nepal is shown in Fig. 6C and (D) respectively. From Afghanistan only two species i.e., Asia 1 and Asia II 5 with few sequences were reported (data not shown here). Moreover, a list of all sequences analyzed is provided in the supplementary file S2, Table S1.   Prevalence of Asia II 1 species in Asia. Asia II 1 species has been confirmed from 10 countries including Pakistan, India, Bangladesh, Nepal, Cambodia, Vietnam, Thailand, Taiwan, China and Syria. The list of species so far reported from each country has been summarized in supplementary file S1, Table 1. The map of the countries where Asia II 1 species has been reported is shown in Fig. 9.

Conclusion and future outlook
In the present study, it has been found that B. tabaci species, Asia II 1 is dominant in Pakistan and its spread in comparison with other species has been determined from other Asian countries. This reports the overall situation of different species of B. tabaci in the region. In the future, similar study can be designed to determine the B. tabaci species from the remaining parts of the world. That will be helpful to understand the overall global situation of different B. tabaci species. Secondly, there is a need to explore the tripartite interactions at the molecular level to deeply understand the compatible interaction of specific viruses with vectors and hosts. This deeper understanding will be help break the compatible interaction and will put a way forward to the development of resistant cultivars.

Discussion
In agriculture, species invasion is among the most important factors which drives the changes in natural equilibria and also it has been a major threat to crop production. Insect has the ability to adapt according to the environmental changes and such changes have often been due to genetic changes 14  The available sequences (till 1st June 2021) of mtCO1 gene of B. tabaci were analyzed to determine their species in these regions. However, this conclusion has been taken based on the available sequences of mtCO1 gene of B. tabaci that may have bias in collecting and submitting of the sequence. Country-wise data is provided with species and years of reporting that shows the overall situation of B. tabaci genetic variation in these countries. In Pakistan, we found the dominance of Asia II 1 in major cotton growing regions whereas in its north central region (Islamabad), Asia II 7 was also identified 15 . One interesting finding in the current analysis is that Asia 1 has disappeared since 2012 from the country. This shows the dominance of Asia II 1 over Asia 1 in Pakistan over the time. In India, Asia 1 is the most dominant species followed by Asia II 1 and others 7,16 . Recently, in a broader survey in India it has been found that Asia II 1 is dominant in its northern region that shows a situation similar to that of Pakistan. The major reason behind this dominance is probably the excessive use of insecticides and the development of insecticide resistance in Asia II 1 species. Due to the extensive use of insecticide, rapid development of resistance has been reported in Asia II 1 and Asia 1 species of B. tabaci 17 . Insecticides have been considered the mainstay in controlling the B. tabaci in agricultural production systems. In the late 1970s and 1980s, pyrethroids insecticides replaced the organophosphates (OPs) and organochlorine insecticides. But in sequences identified in the current study with the already known species of B. tabaci. Tree was rooted on the sequence of Bemisia tuberculata (AY057220) species and a total of 113 sequences were used in this analysis including 82 sequences identified in the current study and other reference sequences of some other species. The multiple sequences alignment was carried out by MUSCLE program in MEGA7 software 32 . Bootstrap method was used with 1000 replications and the percentage bootstrap value (greater than 50%) is shown on each breach. The 80 sequences identified in the current study made the group with Asia II 1 reference sequences (n = 3) and this group is collapsed at the top of the tree showing Asia II 1 (n = 82) whereas two sequences identified in the current study made the clade with Asia II 7 reference sequences. The isolates of the current study are labelled with red circle in the start of their mentioned name in the tree. (B) A neighbor-joining phylogenetic tree was reconstructed by using the current sequences and the reference sequences of only species identified in the current study and an out-group sequence mentioned above. This was included to show the relations among the sequences of the current study that was collapsed in (A). www.nature.com/scientificreports/ the late 1990s these were also replaced by neonicotinoids and other compounds. Excessive use of the same active ingredient and increased use of insecticides within a given cropping system has led to the insecticides resistance development in B. tabaci against the OPs and pyrethroids 18,19 . For example, Naveen (2016) described that Asia 1 and Asia II 1 have evolved resistance against organophosphates (Chlorpyrifos, etc.) and pyrethroids (Deltamethrin, etc.) while Asia II 7 is still susceptible against these insecticides 17 . The mechanism underlying insecticides resistance has been reported in a recent study that described the whole genome sequence of Asia II 1 from Pakistan. They stated that Asia II 1 have 1294 genes with high impact variants including 14 genes have involved in insecticides resistance 17 . In other south Asian countries like Bangladesh and Nepal, they have a similar situation like India where Asia 1 and Asia II 5 acts as a dominant species followed by Asia II 1 respectively resulting in the dominance of Asia II 1 in other countries. This seems Asia II 1 is being dominant in other regions of the world. In South East Asia, the largest number of sequences have been reported from Malaysia, where Asia 1 is dominant followed by MED and others. Interestingly, there is still no report of Asia II 1 from Malaysia whereas it has been found in some other countries in South East Asia like Cambodia, Vietnam and Thailand. In East Asia countries, the largest number of sequences have been reported from China where MED is dominant followed by MEAM1 and ~ 19 species have been found in China. Interestingly, Asia II 1 was recently reported from China where CLCuD is also recently introduced. This clearly shows the relationship of different species with specific begomoviruses diseases in a region. Japan 2 is dominant in other countries like Japan and South Korea whereas MED and MEAM1 are dominant in other countries of East Asia region. In the Middle East region MED and MEAM1 are found dominant in different countries where Asia II 1 is only reported from Syria in the region.
Why Asia II 1 became dominant in Pakistan especially in cotton regions?? There are few gestures and hypothesis by the previous reports from Pakistan. Previously, it has been reported that Asia II 1 is the vector of CLCuD causing begomoviruses/satellites in Punjab, Pakistan where its 1st and 2nd epidemics occurred and this disease has spread to Sindh and also reported from northwestern India. Recently, in a broad survey the 3rd epidemic of CLCuD has been reported from Pakistan and distinct disease complex is found associated with this disease 20,21 . The dominant strain was cotton leaf curl Multan virus (CLCuMuV-Raj) Rajasthan strain and same has been reported from India since 2015 22 . Interestingly, during a nation-wide vector-based survey of begomoviruses in Pakistan it has been found that CLCuMuV-Raj strain is the dominant strain in whitefly found from cotton regions in the country and dominant species was Asia II 1 23 than shows the co-evolution of the tripartite (hostvector-virus) in the region.
Updated record of population surge and diversity of B. tabaci species in a region is critical for developing effective approaches for whitefly control and to prevent the spread of invasive species. This study reports the two www.nature.com/scientificreports/ species Asia II 1 and Asia II 7 with mainly predominance of the Asia II 1 on cotton crop in Punjab and Sindh province of Pakistan. Asia II 1 has been remained dominant in the central region (Punjab where CLCuD is endemic) and MEAM1 have been predominantly found in the southern region (Sindh where TYLCD is endemic) for more than three decades. Asia II 1 and MEAM1 have been prevalent in the overlapping regions in Pakistan for a long time but even then, the MEAM1 could not succeed to displace indigenous species Asia II 1, as has been the global trend. There is persistence of Asia II 1 in the neighboring regions of Pakistan as well as in the northern and central regions of India as in the data shown in Fig. 6A. A recent report from India also reported the Asia II 1 abundance in the northern and central regions of the country 7 . These are basically the cotton growing areas of Pakistan and India. Ahmed et al. 24 directly compare the host suitability between the Asia II 1 and MEAM1 performed similarly across all hosts but its longevity and fecundity is high on tomato plants while Asia II 1 survived best on the cotton plants. Ahmed et al. 25 reported first time the association of CLCuD incidence with the identity and abundance of B. tabaci species and found out the high disease incidence with an abundance of Asia II 1. Then Pan et al. 26 directly compared the transmission efficiency of CLCuMuV in four cryptic species, two native species (Asia 1 and Asia II 1) and two invasive species (MEAM1 and MED) and reported that CLCuMuV is the most www.nature.com/scientificreports/ efficiently transmitted by the Asia II 1. So, might be agroecological niches which is the result of specific tripartite interaction (host, virus, and vector) is more suitable for the persistence abundance of Asia II 1 in this region. In this data, Asia 1 is not found while it has been previously reported from both Punjab and Sindh 13,25 . In 2017, updated data was only found in the Sindh region by Masood et al. 12 . But in few latest reports of the genetic diversity in Pakistan, Asia 1 is not found 15,23 . These reports align with the result of this study.  www.nature.com/scientificreports/ Because of the advancement of the high throughput a cost-effective technologies, instead of relying on a single mitochondrial gene, scientists used whole genome wide and integrated approach to accurately species identification of the B. tabaci 27 . Whole genome data of MEAM1, MED, and Asia II 1 have been available for comparative and evolutionary genomic studies 11,28,29 . These studies lay the foundation for the pan genomics studies which have the potential to explain the difference between invasive and native dominance of B. tabaci.

Materials and methods
Scenario of cotton production and sap-sucking insect infestation. The cotton production data was collected from the major cotton growing areas of Pakistan. The information was extracted from Pakistan Economic Survey, 2020-2021 (https:// www. finan ce. gov. pk/ survey/ chapt ers_ 21/ 02-Agric ulture). Similarly, all the sap sucking insect infestation data was collected from the surveys available on the website of Pest Warning and Quality Control of Pesticides (PWQC), Government of Punjab, Pakistan (http:// pestw arning. agrip unjab. gov. pk/ sucki ng-insect-pests). This data in PWQC is made available after pest scouting where pest scouting teams survey the cotton growing areas of Punjab on regular basis (40-50 random fields/week). The areas with 100 percent pest infestation were termed as hotspots.
Collection of whitefly samples. Adult whiteflies (B. tabaci) were collected from major cotton growing areas from 17 different geographical locations of Punjab, and Sindh province of Pakistan during the year 2017-2020. During the survey, the insect samples were collected randomly (in a zig-zag pattern) from the underside of the leaves using a handheld aspirator. About 20 adult whiteflies were collected from each locality and were kept in 95% ethanol and stored at -20 °C for further experiments. The survey covered 14 districts of Punjab and 2 districts of Sindh province of Pakistan. DNA extraction. Genomic DNA was extracted from 82 individual adult whiteflies. The extraction of DNA was performed using the method described by Zhang (1998) with the following manipulations 30 . Adult whitefly was blotted on filter paper to absorb water and grinded in 300 µL CTAB buffer (100 mM Tris-HCl, 20 mM EDTA, and 1.4 M NaCl consisting of 0.2% β-mercaptoethanol). This mixture after grinding was taken in a microfuge tube and incubated for 15 min at 65 °C, and then one volume of chloroform was added and mixed it manually. The mixture placed in a centrifuge and spun at 13,000 rpm for 4 min and supernatant was collected. One volume  bases) can be amplified by polymerase chain reaction (PCR) by universal primers 31 . PCR was carried out in a total volume of 20 µL reaction tube containing 10 µL DreamTaq Green PCR Master Mix (Thermo Fischer Scientific), 0.5 pmol µL −1 of each primer (forward/reverse), 2 µL DNA (20 ng µL −1 whitefly DNA), and doubledistilled water. This reaction was set up in the PCR machine (Bio-Rad, USA). The PCR conditions comprised of initial denaturation for 5 min at 94 °C, followed by 35 cycles of 94 °C for 30 s, 52 °C annealing for 1 min, 72 °C for 1 min, with a final extension of 10 min at 72 °C. The DNA was analyzed by using gel documentation system (Bio-Rad, USA). Successfully amplified products were excised and purified using Thermo GeneJET gel extraction kit (Thermo Fisher Scientific).
Cloning, sequencing, and data analyses. The purified products were directly ligated in PTZ57R/T plasmid vector (Thermo Fischer Scientific) and transformed in competent cells of Escherichia coli (TOP10 strain). Confirmed clones were selected and the sequence was determined by automated bi-directional, Sanger sequencing (Genomics and Bioinformatics Research Unit, USDA-ARS Stoneville, MS) using 3730XL ABI sequencer (Applied Biosystems, USA). The sequence reads of each clone were assembled by SeqMan in DNASTAR (Madison, WI) and a consensus sequence was saved and analyzed. First, each complete sequence was searched online by BLASTn (Basic Local Alignment Searching Tool (nucleotides) on NCBI database to identify similar sequences. For the identification of species of B. tabaci all the sequences identified in the current study were compared with the reference sequences of the known species of B. tabaci. For phylogenetic tree analysis, all these sequences were first aligned with the sequences of the reference isolates by MUSCLE in MEGA7 software 32 . A Neighbor-joining phylogenetic tree was reconstructed using all these sequences to infer the relationship of the current isolates with the previously known species.
The retrieving of mtCO1 sequences of B. tabaci. The available data of mtCO1 sequences of B. tabaci on GenBank was retrieved on 1st June 2021. A total 12,198 sequences are available under the organism's name, Bemisia tabaci with mtCO1 gene. A major list was prepared in excel and filtered for the desired information. There were some isolates in which country name was not mentioned by the sequence submitter, so for these putative sequences, submitter's country name was considered. Similarly, for many sequences sample collection www.nature.com/scientificreports/ date was missing so for that used the submission date and differentiated these dates (by bold for submission date used instead of actual collection date). For each country, sequences were retrieved based on the accession number against the name of the country in the list. The total final number of sequences from Asia used in the current study were 5142 isolates. For Pakistan, the reporting province name of each sequence was retrieved whereas, for India, zone names were also retrieved for 698 sequences.
Sequence analysis from the countries of Asia. We first filtered the data based on the country names for example for Pakistan, isolated all the sequences reported from Pakistan. Then we did their phylogenetic tree analysis to identify their species. Similarly, for other countries, their isolates were separated and analyses for the identification of their species were done. In the current study, Asian countries from three regions; South Asia, East Asia and Southeast Asia were checked for the B. tabaci presence and its species were determined for each isolate from these countries. Moreover, the countries of Middle East were also checked, as previously Asia II 1 was also reported from Syria (a country from Middle East) because south Asian countries have been good trading partner with the countries of the Middle East.